Distribution of Mutual Information from Complete and Incomplete Data

Authors

  • Marcus Hutter
  • Marco Zaffalon
Abstract

Mutual information is widely used, in a descriptive way, to measure the stochastic dependence of categorical random variables. To address questions such as the reliability of the descriptive value, one must consider sample-to-population inferential approaches. This paper deals with the posterior distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean, and analytical approximations for the variance, skewness and kurtosis, are derived. These approximations have a guaranteed accuracy level of the order O(n^-3), where n is the sample size. Leading-order approximations for the mean and the variance are derived in the case of incomplete samples. The derived analytical expressions allow the distribution of mutual information to be approximated reliably and quickly. In fact, the derived expressions can be computed with the same order of complexity needed for descriptive mutual information. This makes the distribution of mutual information a concrete alternative to descriptive mutual information in many applications that would benefit from moving to the inductive side. Some of these prospective applications are discussed, and one of them, namely feature selection, is shown to perform significantly better when inductive mutual information is used.
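To make the contrast between descriptive and inductive mutual information concrete, the following is a minimal sketch in pure Python. It assumes a complete sample summarized as a table of counts and a uniform Dirichlet prior of one pseudo-count per cell (the `prior` parameter is an illustrative assumption, not a choice taken from the abstract). The function `posterior_mean_mi` uses the closed form that follows from the Beta marginals of a Dirichlet posterior, E[I] = (1/n) Σ_ij n_ij [ψ(n_ij + 1) − ψ(n_i+ + 1) − ψ(n_+j + 1) + ψ(n + 1)], where n_ij includes the prior counts; `sample_posterior_mi` is a plain Monte Carlo cross-check and is not the paper's analytical approximation for the higher moments. All function names are hypothetical.

```python
import math
import random


def digamma(x):
    """Digamma function for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:  # shift the argument up so the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def mutual_information(table):
    """Descriptive mutual information (in nats) of a table of counts or chances."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, v in enumerate(row):
            if v > 0:  # empty cells contribute nothing
                mi += v / total * math.log(v * total / (row_sums[i] * col_sums[j]))
    return mi


def posterior_mean_mi(counts, prior=1.0):
    """Exact posterior mean of mutual information under a Dirichlet posterior.

    Assumption: `prior` pseudo-counts are added to every cell.
    """
    n_ij = [[c + prior for c in row] for row in counts]
    n = sum(sum(row) for row in n_ij)
    row_sums = [sum(row) for row in n_ij]
    col_sums = [sum(col) for col in zip(*n_ij)]
    total = 0.0
    for i, row in enumerate(n_ij):
        for j, v in enumerate(row):
            total += v * (digamma(v + 1) - digamma(row_sums[i] + 1)
                          - digamma(col_sums[j] + 1) + digamma(n + 1))
    return total / n


def sample_posterior_mi(counts, prior=1.0, draws=20000, seed=0):
    """Monte Carlo draws of mutual information from the Dirichlet posterior."""
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        # Independent gamma draws, once normalized, are a Dirichlet sample;
        # mutual_information normalizes by the table total itself.
        gammas = [[rng.gammavariate(c + prior, 1.0) for c in row] for row in counts]
        samples.append(mutual_information(gammas))
    return samples


if __name__ == "__main__":
    counts = [[10, 2], [3, 9]]  # hypothetical 2x2 contingency table
    draws = sample_posterior_mi(counts)
    print("descriptive MI :", mutual_information(counts))
    print("posterior mean :", posterior_mean_mi(counts))
    print("Monte Carlo avg:", sum(draws) / len(draws))
```

The closed-form mean costs one pass over the table, the same order of work as the descriptive value, which is the point the abstract makes about complexity. The sampler is far slower and is included only to sanity-check the closed form on small tables.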


Similar resources

Bayesian Treatment of Incomplete Discrete Data Applied to Mutual Information and Feature Selection. Marcus Hutter and Marco

Given the joint chances of a pair of random variables one can compute quantities of interest, like the mutual information. The Bayesian treatment of unknown chances involves computing, from a second order prior distribution and the data likelihood, a posterior distribution of the chances. A common treatment of incomplete data is to assume ignorability and determine the chances by the expectatio...


Bayesian Treatment of Incomplete Discrete Data Applied to Mutual Information and Feature Selection

Given the joint chances of a pair of random variables one can compute quantities of interest, like the mutual information. The Bayesian treatment of unknown chances involves computing, from a second order prior distribution and the data likelihood, a posterior distribution of the chances. A common treatment of incomplete data is to assume ignorability and determine the chances by the expectatio...


Learning Bayesian Networks under the Control of Mutual Information

The extraction of the structure of a Bayesian network from data is conceived as a stochastic process of learning a random acyclic directed graph. The nodes of the graph are taken as fixed but its arcs are randomly switched on or off. The probabilities for the arcs being in the on or off state are controlled by the amount and the precision of the mutual information the arcs transmit to the neighbori...


Dynamic Bayesian Information Measures

This paper introduces measures of information for Bayesian analysis when the support of data distribution is truncated progressively. The focus is on the lifetime distributions where the support is truncated at the current age t>=0. Notions of uncertainty and information are presented and operationalized by Shannon entropy, Kullback-Leibler information, and mutual information. Dynamic updatings...


Performance Evaluation and Portfolio Selection from Equity Investment Funds

The present study aims to determine a proper decision-making model for investment. To this end, the effective criteria for evaluating the performance of mutual funds are extracted by reviewing the research literature. The importance of each criterion (Sharpe, Treynor, Jensen, Sortino) is then assessed using the Shannon entropy. The study sample includes eight ...



Journal:
  • Computational Statistics & Data Analysis

Volume: 48

Publication date: 2005